
    Balancing lists: a proof pearl

    Starting with an algorithm to turn lists into full trees that uses non-obvious invariants and partial functions, we progressively encode the invariants in the types of the data, removing most of the burden of a correctness proof. The invariants are encoded using non-uniform inductive types which parallel numerical representations in a style advertised by Okasaki, and a small amount of dependent types. Comment: To appear in proceedings of Interactive Theorem Proving (2014).
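    As a rough illustration of the numerical-representation idea, here is a Python sketch of the binary-counter view of the algorithm (our own illustration; the paper's development enforces the invariants statically with non-uniform inductive and dependent types, which this sketch does not capture): inserting a leaf acts like incrementing a binary number, and carries merge equal-sized perfect subtrees.

```python
# Illustrative sketch only (hypothetical; not the paper's formal development).
# Perfect subtrees are merged like carries when incrementing a binary
# counter, in the style of Okasaki's numerical representations.

class Leaf:
    def __init__(self, x):
        self.x = x

class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right

def to_tree(xs):
    """Fold a non-empty list into a full tree (every node has 0 or 2
    children). digits[i] holds at most one perfect subtree of 2**i
    leaves, mirroring the digits of a binary number."""
    digits = []
    for x in xs:
        carry, i = Leaf(x), 0
        while i < len(digits) and digits[i] is not None:
            carry = Node(digits[i], carry)  # carry: merge equal subtrees
            digits[i] = None
            i += 1
        if i == len(digits):
            digits.append(carry)
        else:
            digits[i] = carry
    # Combine the leftover digits into a single full tree.
    trees = [t for t in digits if t is not None]
    result = trees[0]
    for t in trees[1:]:
        result = Node(t, result)
    return result
```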

    An Efficient Algorithm For Chinese Postman Walk on Bi-directed de Bruijn Graphs

    Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a Shortest Double-stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic for getting close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying un-weighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in O(|E|^2 log^2(|V|)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ(p(|V| + |E|) log(|V|) + (d_max p)^3) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{v : d_in(v) - d_out(v) > 0}|, |{v : d_in(v) - d_out(v) < 0}|} and d_max = max_v |d_in(v) - d_out(v)|. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much smaller than the number of nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of p/|V| lies between 0.08% and 0.13% with 95% probability.
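    To make the stated parameters concrete, here is a small Python sketch (our own; it uses a plain directed edge list rather than the paper's bi-directed graphs) that computes p and d_max from the node degree imbalances:

```python
# Hypothetical illustration: compute the imbalance parameters p and d_max
# appearing in the running time, given the arcs of a directed multigraph.
from collections import defaultdict

def imbalance_parameters(edges):
    """edges: iterable of (u, v) arcs. Returns (p, d_max)."""
    din, dout = defaultdict(int), defaultdict(int)
    nodes = set()
    for u, v in edges:
        dout[u] += 1
        din[v] += 1
        nodes.update((u, v))
    surplus = [v for v in nodes if din[v] - dout[v] > 0]
    deficit = [v for v in nodes if din[v] - dout[v] < 0]
    p = max(len(surplus), len(deficit))
    d_max = max((abs(din[v] - dout[v]) for v in nodes), default=0)
    return p, d_max

# Tiny example: one imbalanced node gives p = 1, d_max = 1.
print(imbalance_parameters([("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]))
```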

    Cerulean: A hybrid assembly using high throughput short and long reads

    Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads enables the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have a high error rate, and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. (2012) proposed an approach that uses high quality short read data to correct these long reads and thus make assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and later map the long reads onto the assembly graph to resolve repeats. Contribution: We present a hybrid assembly approach that is both computationally effective and produces high quality assemblies. Our algorithm first operates on a simplified version of the assembly graph consisting only of long contigs and gradually improves the assembly by adding smaller contigs in each iteration, as sketched below. In contrast to the state-of-the-art long read error correction technique, which requires high computational resources and long running times on a supercomputer even for bacterial genome datasets, our software can produce a comparable assembly using only a standard desktop in a short running time. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI 2013).
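    A highly simplified Python sketch of this iterative strategy (our own illustration with hypothetical names and data structures; Cerulean's actual graph operations are more involved):

```python
# Hypothetical sketch: start from the long contigs only and gradually admit
# shorter ones, joining contigs whose adjacency is supported by a long read
# that spans both. Not Cerulean's actual pipeline.

def iterative_join(contig_lengths, long_read_links,
                   thresholds=(10_000, 5_000, 1_000)):
    """contig_lengths: {contig: length in bp};
    long_read_links: set of (contig_a, contig_b) adjacencies, each
    supported by a spanning long-read alignment."""
    active, joined = set(), []
    for t in thresholds:  # progressively relax the length cutoff
        active |= {c for c, n in contig_lengths.items() if n >= t}
        for a, b in long_read_links:
            if a in active and b in active and (a, b) not in joined:
                joined.append((a, b))
    return joined
```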

    Simple Formula for Nuclear Charge Radius

    A new formula for the nuclear charge radius is proposed, dependent on the mass number (A) and the neutron excess (N-Z) in the nucleus. It is simple and it reproduces all the experimentally available mean square radii and their isotopic shifts of even-even nuclei much better than other frequently used relations. Comment: The paper contains 7 pages in LaTeX and 6 figures (available upon request) in postscript. Email: [email protected]
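    For orientation, a Python sketch of the general functional form such a relation takes, an A^(1/3) law with a neutron-excess correction; the constants below are placeholders chosen only for illustration, not the fitted values from the paper:

```python
def charge_radius_fm(A, Z, r0=0.95, b=-0.2):
    """Illustrative charge radius in fm: R = r0 * A**(1/3) * (1 + b*(N-Z)/A).
    WARNING: r0 and b are hypothetical placeholder constants, not the
    coefficients fitted in the paper."""
    N = A - Z
    return r0 * A ** (1.0 / 3.0) * (1.0 + b * (N - Z) / A)

# Example: a rough number for lead-208 (A=208, Z=82).
print(round(charge_radius_fm(208, 82), 2))
```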

    Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

    We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.
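    For context, the adaptive dictionary schemes in question are the LZ78 family; the following minimal Python sketch of an LZ78 parse (our illustration, not the paper's algorithm) shows the phrase structure that compressed matching algorithms operate on instead of the decompressed text:

```python
def lz78_parse(text):
    """Return the LZ78 parse as (phrase_index, char) pairs, where
    phrase_index points to a previously seen phrase (0 = empty)."""
    dictionary = {"": 0}          # phrase -> index
    phrases, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch         # extend the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                   # flush a trailing, already-known phrase
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases

print(lz78_parse("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```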

    Fast Searching in Packed Strings

    Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m ≀ n be the lengths of P and Q, respectively, and let σ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time O(n/log_σ n + m + occ), where occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if m = O(n/log_σ n) our algorithm is optimal, since any algorithm must spend at least Ω((n + m) log σ / log n + occ) = Ω(n/log_σ n + occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation. Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 2009.
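    As a toy Python sketch of the packed representation itself (ours; the paper's speed-up comes from tabulating KMP subautomaton transitions on whole packed words, which is not shown here):

```python
# Toy illustration of packed strings: several small-alphabet characters
# live in one machine word, so one word-sized read inspects many
# characters at once.

def pack(s, bits_per_char=8, chars_per_word=8):
    """Pack string s into a list of integers, bits_per_char bits each."""
    words = []
    for i in range(0, len(s), chars_per_word):
        w = 0
        for ch in reversed(s[i:i + chars_per_word]):
            w = (w << bits_per_char) | ord(ch)
        words.append(w)
    return words

def char_at(words, j, bits_per_char=8, chars_per_word=8):
    """Read the j-th character back out of the packed words."""
    w = words[j // chars_per_word]
    shift = bits_per_char * (j % chars_per_word)
    return chr((w >> shift) & ((1 << bits_per_char) - 1))

packed = pack("abracadabra")
print(char_at(packed, 10))  # -> 'a'
```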

    DooSo6: Easy Collaboration over Shared Projects

    Existing tools for supporting parallel work have some disadvantages that prevent them from being widely used. Very often they require a complex installation and the creation of accounts for all group members. Users need to learn and deal with complex commands to use these collaborative tools efficiently. Some tools require users to abandon their favourite editors and force them to use a certain co-authorship application. In this paper, we propose the DooSo6 collaboration tool that offers support for parallel work, requires no installation and no account creation, and is easy to use, with users able to continue working in their favourite editors. User authentication is achieved by means of a capability-based mechanism.
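    A minimal Python sketch of what a capability-based mechanism can look like (our illustration; the paper's actual protocol may differ): possession of an unguessable token is itself the access right, so no accounts are needed.

```python
# Illustrative capability-style access (hypothetical, not DooSo6's actual
# protocol): an unguessable random token grants access to a shared project;
# sharing the token with a collaborator shares the access right.
import secrets

projects = {}  # capability token -> project state

def create_project(name):
    token = secrets.token_urlsafe(32)   # unguessable capability
    projects[token] = {"name": name, "documents": {}}
    return token                        # handed out of band to the group

def open_project(token):
    # Knowing the token IS the authorization; no user identity is checked.
    return projects[token]

cap = create_project("paper-draft")
print(open_project(cap)["name"])
```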

    Faster Approximate String Matching for Short Patterns

    We study the classical approximate string matching problem: given strings P and Q and an error threshold k, find all ending positions of substrings of Q whose edit distance to P is at most k. Let P and Q have lengths m and n, respectively. On a standard unit-cost word RAM with word size w ≄ log n we present an algorithm using time O(nk · min(log^2 m / log n, log^2 m · log w / w) + n). When P is short, namely m = 2^{o(√(log n))} or m = 2^{o(√(w/log w))}, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism. Comment: To appear in Theory of Computing Systems.
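    For reference, the textbook dynamic program that such algorithms accelerate, in a plain O(nm) Python sketch (our illustration; the paper's contribution is a much faster tabulated Landau-Vishkin implementation, not this):

```python
def approx_matches(P, Q, k):
    """All ending positions j (1-based) in Q of substrings whose edit
    distance to P is at most k. Classic semi-global DP: dp[i] holds the
    cheapest way to turn P[:i] into some substring of Q ending at j."""
    m = len(P)
    dp = list(range(m + 1))           # column j = 0: i deletions from P
    out = []
    for j in range(1, len(Q) + 1):
        prev_diag, dp[0] = dp[0], 0   # a match may start at any position
        for i in range(1, m + 1):
            cur = dp[i]
            dp[i] = min(dp[i] + 1,                           # skip Q[j-1]
                        dp[i - 1] + 1,                       # skip P[i-1]
                        prev_diag + (P[i - 1] != Q[j - 1]))  # (mis)match
            prev_diag = cur
        if dp[m] <= k:
            out.append(j)
    return out

print(approx_matches("survey", "surgery", k=2))  # ending positions in Q
```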

    Can we avoid high coupling?

    It is considered good software design practice to organize source code into modules and to favour within-module connections (cohesion) over between-module connections (coupling), leading to the oft-repeated maxim "low coupling/high cohesion". Prior research into network theory and its application to software systems has found evidence that many important properties in real software systems exhibit approximately scale-free structure, including coupling; researchers have claimed that such scale-free structures are ubiquitous. This implies that high coupling must be unavoidable, statistically speaking, apparently contradicting standard ideas about software structure. We present a model that leads to the simple predictions that approximately scale-free structures ought to arise both for between-module connectivity and overall connectivity, and not as the result of poor design or optimization shortcuts. These predictions are borne out by our large-scale empirical study. Hence we conclude that high coupling is not avoidable, and that this is in fact quite reasonable.
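    A rough Python sketch of the kind of measurement involved (our illustration, not the paper's methodology): tally each module's between-module degree; in an approximately scale-free system this tally has a heavy, roughly straight tail on a log-log plot.

```python
# Hypothetical illustration: coupling degree distribution of a module graph.
from collections import Counter

def coupling_degrees(dependencies):
    """dependencies: iterable of (module_a, module_b) between-module edges.
    Returns {module: number of distinct modules it is coupled to}."""
    neighbours = {}
    for a, b in dependencies:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {m: len(ns) for m, ns in neighbours.items()}

def degree_histogram(degrees):
    """Map degree -> how many modules have that degree."""
    return Counter(degrees.values())

deps = [("ui", "core"), ("db", "core"), ("net", "core"), ("ui", "net")]
print(degree_histogram(coupling_degrees(deps)))  # Counter({2: 2, 3: 1, 1: 1})
```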

    Brane Interaction as the Origin of Inflation

    We reanalyze brane inflation with brane-brane interactions at an angle, which include the special case of brane-anti-brane interaction. If nature is described by a stringy realization of the brane world scenario today (with arbitrary compactification), and if some additional branes were present in the early universe, we find that an inflationary epoch is generically quite natural, ending with a big bang when the last branes collide. In an interesting brane inflationary scenario suggested by generic string model-building, we use the density perturbation observed in the cosmic microwave background and the coupling unification to find that the string scale is comparable to the GUT scale. Comment: 28 pages, 8 figures, 2 tables, JHEP format.
    • 

    corecore